Expose internal scheduling metadata#2873
Conversation
Return compact build-chain metadata from snapshot creation so API placement can later use real build ancestry for resume affinity.
PR SummaryMedium Risk Overview A shared Sandbox create, pause, and checkpoint responses now include this metadata; pause returns a typed response instead of empty. List cached builds merges template TTL/scheduling info with per-build memfile/rootfs cached vs total bytes from the diff store, using true logical size where chunkers only partially fetched data. Completed template builds attach the same shape on Reviewed by Cursor Bugbot for commit d48f099. Bugbot is set up for automated code reviews on this repo. Configure here. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Code Review
In update_snapshot_scheduling_metadata.sql, using to_jsonb on a text parameter stores it as an escaped string scalar rather than a JSONB object, so the parameter should be cast directly to jsonb. In sandboxes.go, accessing res.SchedulingMetadata will cause a compilation error because the snapshotResult struct lacks this field. Additionally, NewSnapshotSchedulingMetadata in snapshot.go is missing nil checks for the Mapping fields of the headers, which can lead to a nil pointer dereference panic.
Populate pause/checkpoint responses from snapshot results and store scheduling metadata as JSONB.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Pause fails after orchestrator pause
- Added failSnapshotBuild call when updateSnapshotSchedulingMetadata fails after successful pause RPC to properly mark build status as failed.
- ✅ Fixed: Metadata error finishes snapshot success
- Added finish(err) call before returning when updateSnapshotSchedulingMetadata fails to prevent deferred finish(nil) from incorrectly marking snapshot as success.
Or push these changes by commenting:
@cursor push caa2284235
Preview (caa2284235)
diff --git a/packages/api/internal/orchestrator/pause_instance.go b/packages/api/internal/orchestrator/pause_instance.go
--- a/packages/api/internal/orchestrator/pause_instance.go
+++ b/packages/api/internal/orchestrator/pause_instance.go
@@ -52,6 +52,8 @@
}
if schedulingMetadata != nil {
if err := o.updateSnapshotSchedulingMetadata(ctx, sbx.SandboxID, schedulingMetadata); err != nil {
+ o.failSnapshotBuild(ctx, result.BuildID, err)
+
return err
}
}
diff --git a/packages/api/internal/orchestrator/snapshot_template.go b/packages/api/internal/orchestrator/snapshot_template.go
--- a/packages/api/internal/orchestrator/snapshot_template.go
+++ b/packages/api/internal/orchestrator/snapshot_template.go
@@ -106,6 +106,8 @@
if err := o.updateSnapshotSchedulingMetadata(ctx, sbx.SandboxID, metadata); err != nil {
o.failSnapshotBuild(ctx, upsertResult.BuildID, err)
+ finish(err)
+
return SnapshotTemplateResult{}, err
}
}You can send follow-ups to the cloud agent here.
Do not fail a successful pause or checkpoint if persisting optional scheduling metadata fails.
This comment has been minimized.
This comment has been minimized.
Use slices.SortFunc and remove an unnecessary conversion in metadata generation.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Limit one drops parent build
- Added condition to prevent base build from overwriting parent build when chain limit is 1 and last element is the parent.
Or push these changes by commenting:
@cursor push 527fa53cf5
Preview (527fa53cf5)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -139,7 +139,7 @@
break
}
}
- if !hasBase {
+ if !hasBase && chain[len(chain)-1].buildID != parentBuildID {
if c, ok := contributions[baseBuildID]; ok {
chain[len(chain)-1] = *c
} else {You can send follow-ups to the cloud agent here.
Avoid replacing the only chain entry with the base build because base build is already stored separately.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Nested metadata breaks JSONBStringMap scans
- ✅ Fixed: Upsert clears scheduling metadata key
- Changed create_new_snapshot.sql conflict handler to use snapshots.metadata || excluded.metadata instead of excluded.metadata, preserving existing metadata keys including snapshot_scheduling_metadata during upsert.
Or push these changes by commenting:
@cursor push c71543e849
Preview (c71543e849)
diff --git a/packages/db/queries/create_new_snapshot.sql.go b/packages/db/queries/create_new_snapshot.sql.go
--- a/packages/db/queries/create_new_snapshot.sql.go
+++ b/packages/db/queries/create_new_snapshot.sql.go
@@ -55,7 +55,7 @@
now()
)
ON CONFLICT (sandbox_id) DO UPDATE SET
- metadata = excluded.metadata,
+ metadata = snapshots.metadata || excluded.metadata,
sandbox_started_at = excluded.sandbox_started_at,
allow_internet_access = COALESCE(excluded.allow_internet_access, snapshots.allow_internet_access),
origin_node_id = excluded.origin_node_id,
diff --git a/packages/db/queries/snapshots/create_new_snapshot.sql b/packages/db/queries/snapshots/create_new_snapshot.sql
--- a/packages/db/queries/snapshots/create_new_snapshot.sql
+++ b/packages/db/queries/snapshots/create_new_snapshot.sql
@@ -41,7 +41,7 @@
now()
)
ON CONFLICT (sandbox_id) DO UPDATE SET
- metadata = excluded.metadata,
+ metadata = snapshots.metadata || excluded.metadata,
sandbox_started_at = excluded.sandbox_started_at,
allow_internet_access = COALESCE(excluded.allow_internet_access, snapshots.allow_internet_access),
origin_node_id = excluded.origin_node_id,
diff --git a/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql b/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
--- a/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
+++ b/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
@@ -3,7 +3,7 @@
SET metadata = jsonb_set(
metadata,
'{snapshot_scheduling_metadata}',
- (@metadata::text)::jsonb,
+ to_jsonb(@metadata::text),
true
)
WHERE sandbox_id = @sandbox_id;
diff --git a/packages/db/queries/update_snapshot_scheduling_metadata.sql.go b/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
--- a/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
+++ b/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
@@ -1,6 +1,6 @@
// Code generated by sqlc. DO NOT EDIT.
// versions:
-// sqlc v1.31.1
+// sqlc v1.29.0
// source: update_snapshot_scheduling_metadata.sql
package queries
@@ -14,7 +14,7 @@
SET metadata = jsonb_set(
metadata,
'{snapshot_scheduling_metadata}',
- ($1::text)::jsonb,
+ to_jsonb($1::text),
true
)
WHERE sandbox_id = $2You can send follow-ups to the cloud agent here.
Store scheduling metadata as a string value in the existing metadata map and keep it across snapshot upserts until refreshed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 055b4e0d2a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Generation incremented twice
- Removed the duplicate generation increment from FromHeaders and moved it to only apply in NewSnapshotSchedulingMetadata for pre-snapshot headers.
Or push these changes by commenting:
@cursor push 9d1cfd4f72
Preview (9d1cfd4f72)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -58,5 +58,9 @@
}
func NewSnapshotSchedulingMetadata(memfileHeader, rootfsHeader *header.Header) *orchestrator.SchedulingMetadata {
- return scheduling.FromHeaders(memfileHeader, rootfsHeader)
+ metadata := scheduling.FromHeaders(memfileHeader, rootfsHeader)
+ if metadata != nil {
+ metadata.Generation++
+ }
+ return metadata
}
diff --git a/packages/orchestrator/pkg/scheduling/metadata.go b/packages/orchestrator/pkg/scheduling/metadata.go
--- a/packages/orchestrator/pkg/scheduling/metadata.go
+++ b/packages/orchestrator/pkg/scheduling/metadata.go
@@ -93,7 +93,7 @@
res := &orchestrator.SchedulingMetadata{
BaseBuildId: baseBuildID.String(),
- Generation: max(memfileHeader.Metadata.Generation, rootfsHeader.Metadata.Generation) + 1,
+ Generation: max(memfileHeader.Metadata.Generation, rootfsHeader.Metadata.Generation),
ChainBuildIds: make([]string, 0, len(chain)),
MemfileLogicalBytes: make([]uint64, 0, len(chain)),
MemfileChunkCounts: make([]uint32, 0, len(chain)),You can send follow-ups to the cloud agent here.
|
Bugbot Autofix prepared fixes for both issues found in the latest run.
Or push these changes by commenting: Preview (526c21400a)diff --git a/packages/orchestrator/pkg/sandbox/sandbox.go b/packages/orchestrator/pkg/sandbox/sandbox.go
--- a/packages/orchestrator/pkg/sandbox/sandbox.go
+++ b/packages/orchestrator/pkg/sandbox/sandbox.go
@@ -1185,7 +1185,7 @@
}
cleanup.AddNoContext(ctx, rootfsDiff.Close)
- schedulingMetadata := NewSnapshotSchedulingMetadata(originalMemfile.Header(), originalRootfs.Header())
+ schedulingMetadata := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, NewResolvedDiffHeader(rootfsHeader))
metadataFileLink := template.NewLocalFileLink(cachePaths.CacheMetadata())
cleanup.AddNoContext(ctx, metadataFileLink.Close)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -57,6 +57,12 @@
return nil
}
-func NewSnapshotSchedulingMetadata(memfileHeader, rootfsHeader *header.Header) *orchestrator.SchedulingMetadata {
+func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) *orchestrator.SchedulingMetadata {
+ memfileHeader, memfileErr := memfileDiffHeader.WaitWithContext(ctx)
+ rootfsHeader, rootfsErr := rootfsDiffHeader.WaitWithContext(ctx)
+ if memfileErr != nil || rootfsErr != nil {
+ return nil
+ }
+
return scheduling.FromHeaders(memfileHeader, rootfsHeader)
}
diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -272,9 +272,9 @@
return t.paths
}
-func (t *storageTemplate) SchedulingMetadata() *orchestrator.SchedulingMetadata {
- memfileHeader, memfileErr := t.memfileHeader.Result()
- rootfsHeader, rootfsErr := t.rootfsHeader.Result()
+func (t *storageTemplate) SchedulingMetadata(ctx context.Context) *orchestrator.SchedulingMetadata {
+ memfileHeader, memfileErr := t.memfileHeader.WaitWithContext(ctx)
+ rootfsHeader, rootfsErr := t.rootfsHeader.WaitWithContext(ctx)
if memfileErr != nil || rootfsErr != nil {
return nil
}
diff --git a/packages/orchestrator/pkg/server/template_cache.go b/packages/orchestrator/pkg/server/template_cache.go
--- a/packages/orchestrator/pkg/server/template_cache.go
+++ b/packages/orchestrator/pkg/server/template_cache.go
@@ -20,9 +20,9 @@
for key, item := range s.templateCache.Items() {
var metadata *orchestrator.SchedulingMetadata
if provider, ok := item.Value().(interface {
- SchedulingMetadata() *orchestrator.SchedulingMetadata
+ SchedulingMetadata(context.Context) *orchestrator.SchedulingMetadata
}); ok {
- metadata = provider.SchedulingMetadata()
+ metadata = provider.SchedulingMetadata(ctx)
}
builds = append(builds, &orchestrator.CachedBuildInfo{
BuildId: key,
diff --git a/packages/orchestrator/pkg/template/build/builder.go b/packages/orchestrator/pkg/template/build/builder.go
--- a/packages/orchestrator/pkg/template/build/builder.go
+++ b/packages/orchestrator/pkg/template/build/builder.go
@@ -390,23 +390,23 @@
KernelVersion: bc.Config.KernelVersion,
FirecrackerVersion: bc.Config.FirecrackerVersion,
RootfsSizeMB: units.BytesToMB(int64(rootfsSize)),
- SchedulingMetadata: templateSchedulingMetadata(builder.templateCache, lastLayerResult.Metadata.Template.BuildID),
+ SchedulingMetadata: templateSchedulingMetadata(ctx, builder.templateCache, lastLayerResult.Metadata.Template.BuildID),
}, nil
}
-func templateSchedulingMetadata(cache *sbxtemplate.Cache, buildID string) *templatemanager.TemplateSchedulingMetadata {
+func templateSchedulingMetadata(ctx context.Context, cache *sbxtemplate.Cache, buildID string) *templatemanager.TemplateSchedulingMetadata {
t, ok := cache.GetCachedTemplate(buildID)
if !ok {
return nil
}
provider, ok := t.(interface {
- SchedulingMetadata() *orchestratorgrpc.SchedulingMetadata
+ SchedulingMetadata(context.Context) *orchestratorgrpc.SchedulingMetadata
})
if !ok {
return nil
}
- return toTemplateSchedulingMetadata(provider.SchedulingMetadata())
+ return toTemplateSchedulingMetadata(provider.SchedulingMetadata(ctx))
}
func toTemplateSchedulingMetadata(m *orchestratorgrpc.SchedulingMetadata) *templatemanager.TemplateSchedulingMetadata {You can send follow-ups to the cloud agent here. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d21605098a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Scheduling metadata uses unset headers
- Updated Fetch() to propagate loaded headers back to memfileHeader and rootfsHeader SetOnce fields when initial headers are nil, ensuring SchedulingMetadata() returns correct values.
Or push these changes by commenting:
@cursor push e86ca15145
Preview (e86ca15145)
diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -206,6 +206,12 @@
return nil
}
+ if memHdr == nil {
+ if err := t.memfileHeader.SetValue(memfileStorage.Header()); err != nil {
+ return fmt.Errorf("failed to update memfile header: %w", err)
+ }
+ }
+
if err := t.memfile.SetValue(memfileStorage); err != nil {
return fmt.Errorf("failed to set memfile value: %w", err)
}
@@ -243,6 +249,12 @@
return nil
}
+ if rootHdr == nil {
+ if err := t.rootfsHeader.SetValue(rootfsStorage.Header()); err != nil {
+ return fmt.Errorf("failed to update rootfs header: %w", err)
+ }
+ }
+
if err := t.rootfs.SetValue(rootfsStorage); err != nil {
return fmt.Errorf("failed to set rootfs value: %w", err)
}You can send follow-ups to the cloud agent here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: ListCachedBuilds omits proto fields
- Restored expiration_time and scheduling_metadata fields by combining data from both template cache and diff store, ensuring all cached builds are listed with complete information.
Or push these changes by commenting:
@cursor push 425e3ebf86
Preview (425e3ebf86)
diff --git a/packages/orchestrator/pkg/server/template_cache.go b/packages/orchestrator/pkg/server/template_cache.go
--- a/packages/orchestrator/pkg/server/template_cache.go
+++ b/packages/orchestrator/pkg/server/template_cache.go
@@ -7,24 +7,47 @@
"github.com/e2b-dev/infra/packages/shared/pkg/grpc/orchestrator"
"google.golang.org/protobuf/types/known/emptypb"
+ "google.golang.org/protobuf/types/known/timestamppb"
)
func (s *Server) ListCachedBuilds(ctx context.Context, _ *emptypb.Empty) (*orchestrator.SandboxListCachedBuildsResponse, error) {
_, childSpan := tracer.Start(ctx, "list-cached-templates")
defer childSpan.End()
- var builds []*orchestrator.CachedBuildInfo
+ buildMap := make(map[string]*orchestrator.CachedBuildInfo)
+ for key, item := range s.templateCache.Items() {
+ info := &orchestrator.CachedBuildInfo{
+ BuildId: key,
+ ExpirationTime: timestamppb.New(item.ExpiresAt()),
+ }
+ if provider, ok := item.Value().(interface {
+ SchedulingMetadata() *orchestrator.SchedulingMetadata
+ }); ok {
+ info.SchedulingMetadata = provider.SchedulingMetadata()
+ }
+ buildMap[key] = info
+ }
+
for _, stats := range s.templateCache.CachedBuildStats(ctx) {
- builds = append(builds, &orchestrator.CachedBuildInfo{
- BuildId: stats.BuildID,
- MemfileCachedBytes: stats.MemfileCachedBytes,
- MemfileCachedChunks: stats.MemfileCachedChunks,
- RootfsCachedBytes: stats.RootfsCachedBytes,
- RootfsCachedChunks: stats.RootfsCachedChunks,
- })
+ info := buildMap[stats.BuildID]
+ if info == nil {
+ info = &orchestrator.CachedBuildInfo{
+ BuildId: stats.BuildID,
+ }
+ buildMap[stats.BuildID] = info
+ }
+ info.MemfileCachedBytes = stats.MemfileCachedBytes
+ info.MemfileCachedChunks = stats.MemfileCachedChunks
+ info.RootfsCachedBytes = stats.RootfsCachedBytes
+ info.RootfsCachedChunks = stats.RootfsCachedChunks
}
+ builds := make([]*orchestrator.CachedBuildInfo, 0, len(buildMap))
+ for _, info := range buildMap {
+ builds = append(builds, info)
+ }
+
return &orchestrator.SandboxListCachedBuildsResponse{
Builds: builds,
}, nilYou can send follow-ups to the cloud agent here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0402551cdf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8779771c62
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a28185d211
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Create scheduling metadata often nil
- Modified GetTemplate to use unresolved SetOnce instances and Fetch to load headers from storage, ensuring SchedulingMetadata returns actual header data instead of nil.
Or push these changes by commenting:
@cursor push 74c45d8201
Preview (74c45d8201)
diff --git a/packages/orchestrator/pkg/sandbox/template/cache.go b/packages/orchestrator/pkg/sandbox/template/cache.go
--- a/packages/orchestrator/pkg/sandbox/template/cache.go
+++ b/packages/orchestrator/pkg/sandbox/template/cache.go
@@ -193,8 +193,8 @@
storageTemplate, err := newTemplateFromStorage(
c.config.BuilderConfig,
buildID,
- resolvedHeader(nil),
- resolvedHeader(nil),
+ utils.NewSetOnce[*header.Header](),
+ utils.NewSetOnce[*header.Header](),
persistence,
c.blockMetrics,
nil,
diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -176,14 +176,28 @@
})
wg.Go(func() error {
- memHdr, hdrErr := t.memfileHeader.WaitWithContext(ctx)
+ memHdr, hdrErr := t.memfileHeader.Result()
if hdrErr != nil {
- errMsg := fmt.Errorf("failed to resolve memfile header: %w", hdrErr)
- if err := t.memfile.SetError(errMsg); err != nil {
- return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+ if _, notSet := hdrErr.(utils.NotSetError); notSet {
+ // Header not set yet, load it from storage
+ loadedHdr, loadErr := header.LoadHeader(ctx, t.persistence, t.paths.MemfileHeader())
+ if loadErr != nil && !errors.Is(loadErr, storage.ErrObjectNotExist) {
+ errMsg := fmt.Errorf("failed to load memfile header: %w", loadErr)
+ if err := t.memfile.SetError(errMsg); err != nil {
+ return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+ }
+ return nil
+ }
+ // Set the header (may be nil if not found)
+ _ = t.memfileHeader.SetValue(loadedHdr)
+ memHdr = loadedHdr
+ } else {
+ errMsg := fmt.Errorf("failed to resolve memfile header: %w", hdrErr)
+ if err := t.memfile.SetError(errMsg); err != nil {
+ return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+ }
+ return nil
}
-
- return nil
}
memfileStorage, memfileErr := NewStorage(
@@ -214,14 +228,28 @@
})
wg.Go(func() error {
- rootHdr, hdrErr := t.rootfsHeader.WaitWithContext(ctx)
+ rootHdr, hdrErr := t.rootfsHeader.Result()
if hdrErr != nil {
- errMsg := fmt.Errorf("failed to resolve rootfs header: %w", hdrErr)
- if err := t.rootfs.SetError(errMsg); err != nil {
- return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+ if _, notSet := hdrErr.(utils.NotSetError); notSet {
+ // Header not set yet, load it from storage
+ loadedHdr, loadErr := header.LoadHeader(ctx, t.persistence, t.paths.RootfsHeader())
+ if loadErr != nil && !errors.Is(loadErr, storage.ErrObjectNotExist) {
+ errMsg := fmt.Errorf("failed to load rootfs header: %w", loadErr)
+ if err := t.rootfs.SetError(errMsg); err != nil {
+ return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+ }
+ return nil
+ }
+ // Set the header (may be nil if not found)
+ _ = t.rootfsHeader.SetValue(loadedHdr)
+ rootHdr = loadedHdr
+ } else {
+ errMsg := fmt.Errorf("failed to resolve rootfs header: %w", hdrErr)
+ if err := t.rootfs.SetError(errMsg); err != nil {
+ return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+ }
+ return nil
}
-
- return nil
}
rootfsStorage, rootfsErr := NewStorage(You can send follow-ups to the cloud agent here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7abe525eae
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Derive scheduling metadata from resolved device/diff headers so pause, create, cached-build listing and template builds report the actual build, its own diff contributions, and the on-disk generation. Report logical vs on-disk byte stats and merge template cache entries into ListCachedBuilds.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Uninitialized storage diff stats panic
- Added nil check in StorageDiff.FileSize to return error instead of dereferencing nil chunker when called on uninitialized diffs.
- ✅ Fixed: Pause hides header wait failures
- Changed NewSnapshotSchedulingMetadata to return error when header wait fails, causing pause/checkpoint operations to fail instead of silently omitting metadata.
Or push these changes by commenting:
@cursor push 5691fa6c77
Preview (5691fa6c77)
diff --git a/packages/orchestrator/pkg/sandbox/build/storage_diff.go b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
--- a/packages/orchestrator/pkg/sandbox/build/storage_diff.go
+++ b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
@@ -128,6 +128,10 @@
}
func (b *StorageDiff) FileSize(ctx context.Context) (int64, error) {
+ if b.chunker == nil {
+ return 0, fmt.Errorf("chunker not initialized")
+ }
+
return b.chunker.FileSize(ctx)
}
diff --git a/packages/orchestrator/pkg/sandbox/sandbox.go b/packages/orchestrator/pkg/sandbox/sandbox.go
--- a/packages/orchestrator/pkg/sandbox/sandbox.go
+++ b/packages/orchestrator/pkg/sandbox/sandbox.go
@@ -1186,7 +1186,10 @@
cleanup.AddNoContext(ctx, rootfsDiff.Close)
rootfsDiffHeader := NewResolvedDiffHeader(rootfsHeader)
- schedulingMetadata := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, rootfsDiffHeader)
+ schedulingMetadata, err := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, rootfsDiffHeader)
+ if err != nil {
+ return nil, fmt.Errorf("failed to resolve scheduling metadata: %w", err)
+ }
metadataFileLink := template.NewLocalFileLink(cachePaths.CacheMetadata())
cleanup.AddNoContext(ctx, metadataFileLink.Close)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -59,12 +59,16 @@
// NewSnapshotSchedulingMetadata derives the metadata from the new snapshot's
// diff headers so the freshly created build and its contributions are included.
-func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) *orchestrator.SchedulingMetadata {
+func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) (*orchestrator.SchedulingMetadata, error) {
memfileHeader, memfileErr := memfileDiffHeader.WaitWithContext(ctx)
+ if memfileErr != nil {
+ return nil, fmt.Errorf("memfile header wait failed: %w", memfileErr)
+ }
+
rootfsHeader, rootfsErr := rootfsDiffHeader.WaitWithContext(ctx)
- if memfileErr != nil || rootfsErr != nil {
- return nil
+ if rootfsErr != nil {
+ return nil, fmt.Errorf("rootfs header wait failed: %w", rootfsErr)
}
- return scheduling.FromHeaders(memfileHeader, rootfsHeader)
+ return scheduling.FromHeaders(memfileHeader, rootfsHeader), nil
}You can send follow-ups to the cloud agent here.
…adata FileSize panicked on diffs cached before Init (nil chunker), matching the nil guards already on Size/IsCached/Close. NewSnapshotSchedulingMetadata now warns when a diff header fails to resolve instead of silently returning nil.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues. You can view the agent here.
Reviewed by Cursor Bugbot for commit d48f099. Configure here.
| return nil | ||
| } | ||
|
|
||
| return scheduling.FromHeaders(memfileHeader, rootfsHeader) |
There was a problem hiding this comment.
Pause waits on memfile header
Medium Severity
NewSnapshotSchedulingMetadata blocks on memfileDiffHeader.WaitWithContext during snapshot creation, so Pause/Checkpoint cannot finish until memfile dedup/header work completes. That reverses the async memfile header path meant to let pause return before compare finishes, lengthening the RPC critical path when dedup is enabled.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit d48f099. Configure here.
| }) | ||
| if len(chain) > chainLimit { | ||
| chain = chain[:chainLimit] | ||
| } |
There was a problem hiding this comment.
Chain cap is thirty two
Medium Severity
Build chain truncation uses chainLimit 32, while the intended hard maximum discussed for this change is 128. Placement can drop lineage entries earlier than the agreed cap, skewing scheduling metadata for deep build chains.
Reviewed by Cursor Bugbot for commit d48f099. Configure here.



Expose compact scheduling metadata from orchestrator and template-manager without DB or public API changes.